from inspect_viz import Data
from inspect_viz.view.beta import tool_calls
tools = Data.from_file("cybench_tools.parquet")
tool_calls(tools)

[Tool call plot. Dataset: cybench_tools.parquet]
This example visualizes tool usage across the turns of a Cybench evaluation. A cell() mark shows the tool used at each message in each sample of the evaluation, and a text() mark on the right side of the frame notes any limit that ended the sample.
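To make the mapping from data to marks concrete, here is a minimal sketch of the one-row-per-message shape the marks read, assuming pandas is available. The column names (id, order, tool_call_function, limit) come from the plot code later on this page; the values are made up for illustration and are not taken from the Cybench log.

```python
import pandas as pd

# Hypothetical toy data (not from the Cybench log): one row per message.
# Columns match the channels used by the marks:
#   cell(): x="order", y="id", fill="tool_call_function"
#   text(): text="limit", y="id"
messages = pd.DataFrame(
    {
        "id": ["task-a", "task-a", "task-a", "task-b", "task-b"],
        "order": [0, 1, 2, 0, 1],
        "tool_call_function": ["bash", "python", "submit", "bash", None],
        # Assumption: the limit is a per-sample value repeated on each row.
        "limit": [None, None, None, "message", "message"],
    }
)

# One cell per (sample, message) pair, colored by the tool called (if any).
cells_per_sample = messages.groupby("id")["order"].count()

# The limit (if any) that ended each sample; missing when none was hit.
limit_per_sample = messages.groupby("id")["limit"].first()

print(cells_per_sample.to_dict())
print(limit_per_sample.to_dict())
```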
See the tool_calls() function documentation for details on the data it requires and on customizing various aspects of the plot. If you are curious about how the plot is implemented, read on.
Here is an annotated version of the code used to produce the tool call plot above; notes on the two marks follow the code.
from inspect_viz import Data
from inspect_viz.plot import plot, legend
from inspect_viz.mark import cell, text
# read data (see 'Data Preparation' below)
data = Data.from_file("cybench_tools.parquet")
# domain of the color scale: one color per tool, in this order
tools = ["bash", "python", "submit"]

plot(
    cell(
        data,
        x="order",
        y="id",
        fill="tool_call_function"
    ),
    text(
        data,
        text="limit",
        y="id",
        frame_anchor="right",
        font_size=8,
        font_weight=200,
        dx=50
    ),
    legend=legend("color", location="right"),
    margin_top=0,
    margin_left=20,
    margin_right=100,
    x_ticks=list(range(0, 400, 80)),
    y_ticks=[],
    x_label="Message",
    y_label="Sample",
    color_label="Tool",
    color_domain=tools
)

The cell() mark shows the tool called at each message of each sample.
The text() mark shows whether the sample terminated due to a limit.
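The plot hard-codes x_ticks=list(range(0, 400, 80)), which places ticks at 0, 80, 160, 240, and 320 and suits this particular log. If you adapt the code to another log, one option is to derive the ticks from the data instead. The helper below is a sketch of that idea; message_ticks is a hypothetical name and is not part of Inspect Viz.

```python
def message_ticks(max_order: int, step: int = 80) -> list[int]:
    """Return x tick positions from 0 up to max_order (inclusive), every `step` messages.

    Hypothetical helper: not part of inspect_viz.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, max_order + 1, step))


# The hard-coded ticks used in the example:
print(list(range(0, 400, 80)))  # [0, 80, 160, 240, 320]

# Ticks derived from the largest message index in a (hypothetical) log:
print(message_ticks(350))  # [0, 80, 160, 240, 320]
```

With the data in a pandas frame, you might pass x_ticks=message_ticks(int(df["order"].max())) so the axis follows the longest sample.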
To create the plot, we read a raw messages data frame from an eval log [1] and then filter it down to just the fields required for visualization.
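The data preparation code itself is not reproduced in this section. As a rough sketch only, assume the raw messages frame is a pandas DataFrame whose columns have the hypothetical names sample_id, message_index, role, content, tool_call_function, and limit (the real column names in an Inspect eval log frame may differ). The filtering step then amounts to renaming and keeping the four columns the marks use.

```python
import pandas as pd

# Hypothetical raw messages frame; real column names from an Inspect eval
# log may differ. Extra columns stand in for the many fields a raw frame has.
raw = pd.DataFrame(
    {
        "sample_id": ["task-a", "task-a", "task-b"],
        "message_index": [0, 1, 0],
        "role": ["assistant", "tool", "assistant"],
        "content": ["Let me look around.", "total 8", "Trying python."],
        "tool_call_function": ["bash", None, "python"],
        "limit": [None, None, "message"],
    }
)

# Keep only the fields the marks need, renamed to the names the plot uses.
tools_df = raw.rename(columns={"sample_id": "id", "message_index": "order"})[
    ["id", "order", "tool_call_function", "limit"]
]

print(list(tools_df.columns))  # ['id', 'order', 'tool_call_function', 'limit']
```

Writing the result with DataFrame.to_parquet (which requires pyarrow or fastparquet) produces a Parquet file of the kind that Data.from_file reads in the example above.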
Note that the trimming of columns is particularly important because Inspect Viz embeds datasets directly in the web pages that host them (so we want to minimize their size for page load performance and bandwidth usage).
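To get a feel for why trimming matters, the sketch below compares the serialized size of a wide frame and its trimmed version using synthetic data. JSON length is used only as a rough proxy for embedded size; the format Inspect Viz actually embeds in the page may differ.

```python
import pandas as pd

n = 1_000
# Synthetic wide frame: the plot fields plus a bulky text column standing in
# for the message content a raw messages frame typically carries.
wide = pd.DataFrame(
    {
        "id": [f"task-{i // 100}" for i in range(n)],
        "order": [i % 100 for i in range(n)],
        "tool_call_function": ["bash"] * n,
        "limit": [None] * n,
        "content": ["x" * 500] * n,
    }
)
narrow = wide[["id", "order", "tool_call_function", "limit"]]

# JSON length as a rough proxy for the bytes that would be embedded in a page.
wide_bytes = len(wide.to_json(orient="records"))
narrow_bytes = len(narrow.to_json(orient="records"))
print(wide_bytes, narrow_bytes)
```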
[1] The eval log read for this example is in the inspect-viz-example-logs repo.